A Resampling Technique for Relational Data Graphs
Abstract
Resampling (a.k.a. bootstrapping) is a computationally intensive statistical technique for estimating the sampling distribution of an estimator. Resampling is used in many machine learning algorithms, including ensemble methods, active learning, and feature selection. Resampling techniques generate pseudosamples from an underlying population by sampling with replacement from a single sample dataset. It is straightforward to sample with replacement from propositional data that are independent and identically distributed (i.i.d.). However, it is not clear how to sample with replacement from an interconnected relational data graph with dependencies among related instances. In this paper, we develop a novel method for resampling from relational data that uses a subgraph sampling approach to preserve the local relational dependencies while generating a pseudosample with sufficient global variance. We evaluate our approach on synthetic data, showing that compared to an i.i.d. resampling approach it results in significantly lower error when used to estimate the variance of feature scores. We also evaluate our approach on a real-world relational classification task, showing that it improves the accuracy of bagging when compared with i.i.d. resampling.
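The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify in detail. The function names (`iid_bootstrap`, `grow_subgraph`, `subgraph_pseudosample`), the breadth-first growth rule, and the `subgraph_size` parameter are all assumptions made for illustration. The first function draws i.i.d. pseudosamples with replacement; the other two build a pseudosample from small connected subgraphs, so that each sampled node keeps some of its neighbors.

```python
import random
from collections import deque


def iid_bootstrap(sample, estimator, n_rounds=200, rng=None):
    """Evaluate `estimator` on pseudosamples drawn with replacement (i.i.d. case)."""
    rng = rng or random.Random(0)
    n = len(sample)
    return [
        estimator([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_rounds)
    ]


def grow_subgraph(adj, seed, max_nodes):
    """Collect up to `max_nodes` nodes around `seed` by breadth-first search.

    Hypothetical helper: one simple way to keep local dependencies together.
    """
    order = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
                if len(order) == max_nodes:
                    break
    return order


def subgraph_pseudosample(adj, subgraph_size, rng):
    """Build a pseudosample with as many node copies as the original graph.

    Seeds are drawn with replacement, so a node may appear in several
    subgraphs; this is an illustrative assumption, not the paper's procedure.
    """
    nodes = list(adj)
    remaining = len(nodes)
    pseudosample = []
    while remaining > 0:
        seed = nodes[rng.randrange(len(nodes))]
        subgraph = grow_subgraph(adj, seed, min(subgraph_size, remaining))
        pseudosample.append(subgraph)
        remaining -= len(subgraph)
    return pseudosample


if __name__ == "__main__":
    # Toy relational graph: a ring of 10 nodes.
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    print(subgraph_pseudosample(ring, subgraph_size=3, rng=random.Random(1)))
    means = iid_bootstrap(list(range(10)), lambda xs: sum(xs) / len(xs))
    print(len(means))
```

In this sketch the subgraph size controls the trade-off the abstract mentions: larger subgraphs retain more local structure, while smaller ones give more variation between pseudosamples.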
Related resources
Learning from Partially Labeled Data: Unsupervised and Semi-supervised Learning on Graphs and Learning with Distribution Shifting
This thesis focuses on two fundamental machine learning problems: unsupervised learning, where no label information is available, and semi-supervised learning, where a small number of labels is given in addition to unlabeled data. These problems arise in many real-world applications, such as Web analysis and bioinformatics, where a large amount of data is available, but no or only a small amoun...
Resampling in an Indefinite Database to Approximate Functional Dependencies Research Note Rn/98/10
We reintroduce Numerical Dependencies (NDs), defined originally to enhance database design, within a data mining context where we use ND sets to approximate the satisfaction of a given Functional Dependency (FD) set within a relation. We motivate NDs by examining the use of indefinite information in relations. Indefinite information is represented within the relational model by allowing cells to c...
Investigating the Impact of Information Quality on Relationship Marketing with Mediating Role of Salespeople’ Relational Competency: Survey about Iranian ISP
Despite the vital role of information in relational-oriented firms, there are limited studies on the impact of information quality on relationship marketing. To address this gap, this study develops a conceptual model to examine the impact of information quality on the successful implementation of relationship marketing by assessing the mediating role of salespeople's relational competency. The...
IMPORTANCE RESAMPLING FOR GLOBAL ILLUMINATION by Justin
IMPORTANCE RESAMPLING FOR GLOBAL ILLUMINATION Justin F. Talbot Department of Computer Science Master of Science This thesis develops a generalized form of Monte Carlo integration called Resampled Importance Sampling. It is based on the importance resampling sample generation technique. Resampled Importance Sampling can lead to significant variance reduction over standard Monte Carlo integration...